ROCm and HIP: A Detailed 10-Chapter Tutorial on Shifting Your GPU Synchronization Mindset

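The fundamental shift in high-performance computing is the move from a CPU-centric, sequential execution model to an asynchronous producer-consumer model in which the CPU manages the pipeline and the GPU works independently. The core insight is that a GPU is not meant to be driven as a strictly synchronous device. Treating it as one creates a "stop-and-wait" bottleneck.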

1. The Workflow Lifecycle

In the asynchronous mindset, the developer does not wait for each task to complete. Instead, results come from submitting non-blocking requests to a hardware queue: allocate memory, launch the kernel, and copy the results back.

2. Avoiding Stalls

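When the host is forced to synchronize after every operation, execution gaps (the time spent communicating between the CPU and the GPU) dominate performance. With asynchrony, by contrast, the CPU keeps working while the GPU processes its own stream, which maximizes hardware saturation.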

$$\text{Total time} = \max(\text{CPU work}, \text{GPU work}) + \text{synchronization overhead}$$
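To make the formula concrete, the sketch below evaluates it next to a simple serialized baseline. The struct, the function names, and any sample numbers are invented for illustration. The serialized model (CPU work plus GPU work plus one synchronization overhead per blocking point) is an assumption used only for comparison and is not stated in the text.

```cpp
#include <algorithm>

// Illustrative timing inputs, all in milliseconds.
struct StepTimes {
    double cpu_work_ms;
    double gpu_work_ms;
    double sync_overhead_ms;
};

// Overlapped execution: the formula from the text, with a single
// synchronization at the end.
double overlapped_time_ms(const StepTimes& t) {
    return std::max(t.cpu_work_ms, t.gpu_work_ms) + t.sync_overhead_ms;
}

// Assumed baseline: the host blocks at `sync_points` places, so CPU and GPU
// work never overlap and the overhead is paid once per blocking point.
double serialized_time_ms(const StepTimes& t, int sync_points) {
    return t.cpu_work_ms + t.gpu_work_ms + sync_points * t.sync_overhead_ms;
}
```

With 4 ms of CPU work, 10 ms of GPU work, and 0.5 ms of overhead, the overlapped model gives 10.5 ms, while the assumed serialized model with three blocking points gives 15.5 ms. The gap grows with the number of blocking points and with how evenly the work is split between CPU and GPU.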

QUESTION 1

Which set of steps correctly converts a synchronous vector-add to use an explicit stream?

Call hipStreamCreate, use hipMemcpyAsync with the handle, and pass the handle as the 4th kernel argument.

Call hipDeviceSynchronize after every kernel launch and use hipMemcpy.

Set the stream parameter to NULL in all hipMemcpyAsync calls.

Replace hipMalloc with hipHostMalloc exclusively.

QUESTION 2

Why is a GPU considered 'not meant to be driven as a strictly synchronous device'?

Because it has no internal clock.

Because waiting for the CPU to confirm every command leaves thousands of cores idle.

Because memory transfers cannot be tracked by the CPU.

Because the GPU must manage its own power state.

QUESTION 3

What is the primary risk of forcing the host to synchronize after every operation?

Memory corruption.

Host-side stalling and loss of hardware saturation.

Increased power consumption on the GPU.

Kernel compile errors.

QUESTION 4

In the logistics warehouse analogy, what does the 'Conveyor Belt' represent?

A HIP Stream.

The GPU Driver.

The CPU Cache.

The VRAM buffer.

QUESTION 5

True or False: hipMemcpyAsync returns control to the CPU before the data transfer is complete.

True

False